Pydantic integration #3086
Thanks! LGTM. I only have comments on the example app. I only skimmed the frontend code.
name = "quotes"
version = "0.1"
dependencies = [
    "elasticsearch[async]>=8,<9",
-    "elasticsearch[async]>=8,<9",
+    "elasticsearch[async]>=9,<10",
Are you planning to backport this?
Good catch! I created the project on which this example is based for a talk I gave last year, so the versions are out of date. I will refresh.
doc = None
try:
    doc = await Quote._doc.get(id)
except NotFoundError:
    pass
if not doc:
    raise HTTPException(status_code=404, detail="Item not found")
return Quote.from_doc(doc)
Isn't this code the same?
-doc = None
-try:
-    doc = await Quote._doc.get(id)
-except NotFoundError:
-    pass
-if not doc:
-    raise HTTPException(status_code=404, detail="Item not found")
-return Quote.from_doc(doc)
+try:
+    doc = await Quote._doc.get(id)
+    return Quote.from_doc(doc)
+except NotFoundError:
+    raise HTTPException(status_code=404, detail="Item not found")
This also applies to `get_quote` and `delete_quote`.
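To make the comparison concrete, here is a minimal, self-contained sketch of the two shapes. `STORE`, `fake_get`, and the two exception classes are stand-ins invented for this illustration; they are not the real Elasticsearch or FastAPI objects, and the real `Quote._doc.get()` may behave differently in edge cases.

```python
import asyncio


class NotFoundError(Exception):
    """Stand-in for the Elasticsearch client's NotFoundError (assumption)."""


class HTTPException(Exception):
    """Stand-in for FastAPI's HTTPException (assumption)."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


STORE = {"1": "a stored quote"}  # hypothetical in-memory "index"


async def fake_get(doc_id: str) -> str:
    """Mimic a lookup that raises when the id is unknown."""
    try:
        return STORE[doc_id]
    except KeyError:
        raise NotFoundError(doc_id) from None


async def get_verbose(doc_id: str) -> str:
    # Shape of the original code: swallow the error, then check for a value.
    doc = None
    try:
        doc = await fake_get(doc_id)
    except NotFoundError:
        pass
    if not doc:
        raise HTTPException(status_code=404, detail="Item not found")
    return doc


async def get_compact(doc_id: str) -> str:
    # Shape of the suggestion: translate the error where it is caught.
    try:
        return await fake_get(doc_id)
    except NotFoundError:
        raise HTTPException(status_code=404, detail="Item not found")


def outcome(handler, doc_id: str):
    """Return the handler's result, or the HTTP status code it raised."""
    try:
        return asyncio.run(handler(doc_id))
    except HTTPException as exc:
        return exc.status_code
```

Under these assumptions both handlers return the stored value for a known id and raise a 404 for an unknown one. They would only diverge if the lookup could return a falsy value without raising, in which case the verbose version still answers 404 and the compact version returns the falsy value.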
async def create_quote(req: Quote) -> Quote:
    embed_quotes([req])
    doc = req.to_doc()
    doc.meta.id = ""
Why is this needed?
if req.query == '':
    s = s.query(dsl.query.MatchAll())
elif req.knn:
    s = s.query(dsl.query.Knn(field=Quote._doc.embedding, query_vector=model.encode(req.query).tolist()))
nit: Splitting this into two lines could help with readability:
-    s = s.query(dsl.query.Knn(field=Quote._doc.embedding, query_vector=model.encode(req.query).tolist()))
+    query_vector = model.encode(req.query).tolist()
+    s = s.query(dsl.query.Knn(field=Quote._doc.embedding, query_vector=query_vector))
Quotes database example, which demonstrates the Elasticsearch integration with
Pydantic models. This example features a React frontend and a FastAPI back end.

![Quotes app screenshot](screenshot.png)
Great quotes! Do you maybe have an example that does rely on embeddings?
I don't understand what you mean here. This example uses embeddings, both on their own and combined with BM25 search.
I mean, the results would have been the same with BM25 only
So maybe there is something other than "dogs and books" that we can use. That's just a nitpick.
Ah, I see, you are talking about the screenshot. In fact, "dogs and books" does not match the Groucho Marx quote at the top when using BM25, because that quote contains "dog" and "book" in the singular. I have to test this, but maybe using "canine" instead of "dog" makes the example clearer and returns the same results.
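The singular/plural point can be sketched with a toy term matcher. This is a simplified model that only lowercases and splits on non-alphanumeric characters, with no stemming; it is not Elasticsearch's analyzer or its BM25 scoring, and the `quote` text below is a made-up sentence, not the one in the screenshot.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric terms (no stemming)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


# Hypothetical document text that mentions the singular forms only.
quote = "Outside of a dog, a book is the best friend."

plural_overlap = tokens("dogs and books") & tokens(quote)
singular_overlap = tokens("dog book") & tokens(quote)
```

With exact term matching, the plural query shares no terms with the text, while the singular query shares both. A vector search compares embeddings rather than exact terms, which is why it can still surface such a quote.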
What is the purpose of this folder? This looks like a create-react-app artifact that is not actually needed here.
Correct. This project was initially created with `create-react-app`, and I now refreshed it and moved it to Vite. The public folder doesn't serve any purpose in this example, but I'm guessing Vite expects it to be there, since it serves static files off of it.
This change adds a few features that support the use of Pydantic models with the DSL module, instead of the standard models defined as subclasses of the `AsyncDocument` class.

As part of this work some additions have been made to the typing implementation of DSL documents.

- Support for the `Annotated` syntax when defining document fields in the DSL module.
- New `BaseESModel` and `AsyncBaseESModel` classes that inherit from Pydantic's `BaseModel` and add Elasticsearch superpowers. In particular, any model defined with one of these as its base class will have `meta` and `_doc` private attributes and `to_doc()` and `from_doc()` methods. The `meta` attribute includes metadata for each document, things such as `id` or `score`. The `_doc` attribute is a dynamically generated `Document` or `AsyncDocument` instance that can be used whenever access to the Elasticsearch index is needed. The methods convert between Pydantic models and ES documents.

Aside from the extra attributes, this class works exactly like `BaseModel` and can be used to define data attributes and their validation rules, and the ES document is derived from them automatically. In particular, this class can be used in FastAPI routes, as shown in the `quotes` example included in this PR. Any annotations intended for the DSL module can be included in the `Annotated[]` type hint of the respective fields. The `Index` inner class can be included as well.
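As a rough illustration of how extra annotations can ride along in an `Annotated[]` type hint without changing the base type, the sketch below uses only the standard library. The `Keyword` marker, the plain `Quote` class, and the `field_metadata` helper are hypothetical names invented for this example; they are not the DSL module's API and say nothing about how this PR reads the annotations internally.

```python
from typing import Annotated, get_type_hints


class Keyword:
    """Hypothetical marker standing in for a DSL field option."""

    def __repr__(self) -> str:
        return "Keyword()"


class Quote:
    # Plain field: no extra metadata attached.
    quote: str
    # The base type is still `str`; the marker is carried alongside it.
    author: Annotated[str, Keyword()]


def field_metadata(cls: type) -> dict[str, tuple]:
    """Collect the extra `Annotated` arguments declared on each field."""
    hints = get_type_hints(cls, include_extras=True)
    return {name: getattr(hint, "__metadata__", ()) for name, hint in hints.items()}


metadata = field_metadata(Quote)
plain_hints = get_type_hints(Quote)  # extras stripped: what a validator would see
```

Here `metadata["author"]` holds the `Keyword()` marker while `plain_hints["author"]` is just `str`, which is the property that lets one field declaration serve both a validation layer and a mapping layer.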